Abstract

Under a partially linear model, we study a family of robust estimates for the regression parameter and the regression function when some of the predictor variables take values on a Riemannian manifold. We obtain the consistency and the asymptotic normality of the proposed estimators. We also consider a robust cross-validation procedure to select the smoothing parameter. Simulations and an application to real data show the performance of our proposal under small samples and contamination.
1 Introduction

Partially linear regression models (PLM) assume that the regression function can be modeled linearly on some covariates, while it depends nonparametrically on others. To be more precise, assume that we have a response $y_i \in \mathbb{R}$ and covariates $(\mathbf{x}_i, t_i)$ such that $\mathbf{x}_i \in \mathbb{R}^p$ and $t_i \in [0, 1]$, satisfying

$$y_i = \mathbf{x}_i^{\mathrm{T}}\boldsymbol{\beta} + g(t_i) + \varepsilon_i, \qquad 1 \le i \le n, \qquad (1)$$

where the errors $\varepsilon_i$ are independent and independent of $(\mathbf{x}_i, t_i)$. Since the introductory work by [10], partly linear models have become an important tool in the modeling of econometric or biometric data, because they combine the flexibility of nonparametric models with the simple interpretation of linear ones. However, in many applications, the predictor variables take values on a Riemannian manifold rather than on a Euclidean space, and this structure of the variables needs to be taken into account in the estimation procedure.



In a recent work (see [11]), we studied a PLM when the explanatory variables take values on a Riemannian manifold, and we explored the potential of this model in an application to an environmental problem. Unfortunately, as we will see in Section 5, this approach may not work as desired, because PLM estimators can be very sensitive to the presence of a small proportion of observations that deviate from the assumed model. One way to avoid this problem is to derive robust estimators to fit PLMs that can resist the effect of a small number of atypical observations. The goal of this paper is to introduce resistant estimators for the regression parameter and the regression function under the PLM when the predictor variable $t$ takes values on a Riemannian manifold.

This paper is organized as follows. In Section 2, we give a brief summary of the classical estimation proposal for this model and we introduce the robust estimates. In Section 3, we study the consistency and the asymptotic distribution of the regression parameter estimates under regularity assumptions on the bandwidth sequence. A robust cross-validation method for bandwidth selection is considered in Section 4. Section 5 includes a simulation study to explore the performance of the new estimators under normality and contamination. We present an example using real data in Section 6. Proofs are given in the Appendix.



2 The model and the estimators 
2.1 Classical estimators 

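Assume that we have a sample of $n$ independent variables $(y_i, \mathbf{x}_i^{\mathrm{T}}, t_i)$ in $\mathbb{R}^{p+1} \times M$, identically distributed as $(y, \mathbf{x}^{\mathrm{T}}, t)$, where $(M, \gamma)$ is a Riemannian manifold of dimension $d$. The partially linear model assumes that the relation between the response variable and the covariates $(\mathbf{x}_i, t_i)$ can be represented as

$$y_i = \mathbf{x}_i^{\mathrm{T}}\boldsymbol{\beta} + g(t_i) + \varepsilon_i, \qquad 1 \le i \le n, \qquad (2)$$

where the errors $\varepsilon_i$ are independent and independent of $(\mathbf{x}_i, t_i)$, and we assume that $\varepsilon$ has a symmetric distribution. Denote $\phi_0(\tau) = E(y \mid t = \tau)$ and $\boldsymbol{\phi}(\tau) = (\phi_1(\tau), \dots, \phi_p(\tau))^{\mathrm{T}}$, where $\phi_j(\tau) = E(x_{ij} \mid t = \tau)$ for $1 \le j \le p$. Then $g(t) = \phi_0(t) - \boldsymbol{\phi}(t)^{\mathrm{T}}\boldsymbol{\beta}$, and hence $y - \phi_0(t) = (\mathbf{x} - \boldsymbol{\phi}(t))^{\mathrm{T}}\boldsymbol{\beta} + \varepsilon$. The classical least squares estimator of $\boldsymbol{\beta}$ can therefore be obtained as

$$\hat{\boldsymbol{\beta}}_{ls} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^n \left[(y_i - \hat\phi_{0,ls}(t_i)) - (\mathbf{x}_i - \hat{\boldsymbol{\phi}}_{ls}(t_i))^{\mathrm{T}}\boldsymbol{\beta}\right]^2,$$

where $\hat\phi_{0,ls}$ and $\hat{\boldsymbol{\phi}}_{ls}$ are nonparametric kernel estimators of $\phi_0$ and $\boldsymbol{\phi}$, respectively. More precisely, the nonparametric estimators $\hat\phi_{0,ls}$ and $\hat\phi_{j,ls}$ of $\phi_0$ and $\phi_j$ can be defined as (see [21])

$$\hat\phi_{0,ls}(t) = \sum_{i=1}^n w_{n,h}(t, t_i)\, y_i \quad \text{and} \quad \hat\phi_{j,ls}(t) = \sum_{i=1}^n w_{n,h}(t, t_i)\, x_{ij}, \qquad (3)$$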

where $w_{n,h}(t, t_i) = \theta_t^{-1}(t_i)\, K(d_\gamma(t, t_i)/h_n) \big/ \sum_{k=1}^n \theta_t^{-1}(t_k)\, K(d_\gamma(t, t_k)/h_n)$, with $K: \mathbb{R} \to \mathbb{R}$ a nonnegative function, $d_\gamma$ the distance induced by the metric $\gamma$, $\theta_t(s)$ the volume density function on $(M, \gamma)$, and the bandwidth $h_n$ a sequence of positive real numbers such that $\lim_{n\to\infty} h_n = 0$ and $h_n$ is smaller than the injectivity radius of $(M, \gamma)$, $\mathrm{inj}_\gamma M$. As in [13], we consider $(M, \gamma)$ a $d$-dimensional compact oriented Riemannian manifold without boundary. Note that in this case $\mathrm{inj}_\gamma M > 0$. For a rigorous definition of the volume density function and the injectivity radius, see [1] or [13].
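To make the weights $w_{n,h}$ concrete, the following Python sketch evaluates them on the simplest compact manifold without boundary, the unit circle $S^1$ ($d = 1$). It is an illustration under our own assumptions, not the implementation used in this paper: points are stored as angles, $d_\gamma$ is the arc-length distance, $\theta_t(s) \equiv 1$ because the circle is flat, and $K$ is the kernel $(15/16)(1-u^2)^2 I(|u|<1)$ used later in Section 5. All function names are hypothetical.

```python
import math

def geodesic_circle(a: float, b: float) -> float:
    """Arc-length (geodesic) distance between two angles on the unit circle."""
    diff = abs(a - b) % (2.0 * math.pi)
    return min(diff, 2.0 * math.pi - diff)

def quartic_kernel(u: float) -> float:
    """K(u) = (15/16)(1 - u^2)^2 for |u| < 1 and 0 otherwise."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0

def kernel_weights(t, ts, h, volume_density=lambda t, s: 1.0):
    """w_{n,h}(t, t_i) = theta_t(t_i)^{-1} K(d(t, t_i)/h), normalised to sum to one.

    volume_density defaults to 1, which is exact for the flat circle.
    """
    raw = [quartic_kernel(geodesic_circle(t, ti) / h) / volume_density(t, ti) for ti in ts]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no observation within distance h of t")
    return [r / total for r in raw]

def kernel_smoother(t, ts, values, h):
    """Classical estimator (3): kernel-weighted average of the responses at t."""
    return sum(w * v for w, v in zip(kernel_weights(t, ts, h), values))

if __name__ == "__main__":
    ts = [2.0 * math.pi * i / 50 for i in range(50)]
    ys = [math.cos(ti) for ti in ts]
    # Close to cos(0) = 1, up to a small smoothing bias.
    print(kernel_smoother(0.0, ts, ys, 0.5))
```

For another manifold only the distance and the volume density change; the normalisation is what makes (3) a weighted average, which is also why it reacts strongly to a single large response.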
The final least squares estimator of $g$ can be taken as $\hat g_{ls}(t) = \hat\phi_{0,ls}(t) - \hat{\boldsymbol{\phi}}_{ls}(t)^{\mathrm{T}}\hat{\boldsymbol{\beta}}_{ls}$. The properties of these estimators have been studied in [11], and in the case of Euclidean data they have been widely studied in the literature; see, for example, [10], [22], [9] and [18].
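For $p = 1$ the least squares problem above has the closed form $\hat\beta_{ls} = \sum_i \hat\eta_i \hat r_i / \sum_i \hat\eta_i^2$ in terms of the partial residuals $\hat r_i = y_i - \hat\phi_{0,ls}(t_i)$ and $\hat\eta_i = x_i - \hat\phi_{1,ls}(t_i)$. The sketch below computes it on the unit circle, assuming the arc-length distance, a unit volume density and the quartic kernel; the helper names and the simulated model in the usage block are hypothetical choices made only for illustration.

```python
import math
import random

def circle_dist(a, b):
    """Geodesic distance between two angles on the unit circle."""
    d = abs(a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)

def quartic(u):
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0

def smooth(t, ts, vals, h):
    """Kernel estimator (3) on the circle; the volume density is identically 1 there."""
    w = [quartic(circle_dist(t, s) / h) for s in ts]
    return sum(wi * v for wi, v in zip(w, vals)) / sum(w)

def beta_ls(ys, xs, ts, h):
    """Least squares on the partial residuals y_i - phi0_hat(t_i) and x_i - phi1_hat(t_i) (p = 1)."""
    ry = [y - smooth(t, ts, ys, h) for y, t in zip(ys, ts)]
    rx = [x - smooth(t, ts, xs, h) for x, t in zip(xs, ts)]
    return sum(a * b for a, b in zip(rx, ry)) / sum(a * a for a in rx)

def g_ls(t, ys, xs, ts, h, beta):
    """g_hat(t) = phi0_hat(t) - phi1_hat(t) * beta_hat."""
    return smooth(t, ts, ys, h) - smooth(t, ts, xs, h) * beta

if __name__ == "__main__":
    rng = random.Random(1)
    ts = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(200)]
    xs = [math.sin(t) + rng.gauss(0.0, 1.0) for t in ts]
    ys = [2.0 * x + math.cos(t) + rng.gauss(0.0, 0.5) for x, t in zip(xs, ts)]
    b = beta_ls(ys, xs, ts, h=0.5)
    print(b, g_ls(0.0, ys, xs, ts, 0.5, b))  # roughly 2 and cos(0) = 1
```

In this toy model $g(t) = \cos t$ and $\beta = 2$, so both printed values can be compared with the truth.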

2.2 Robust estimates 

As in the Euclidean setting, the estimators introduced by [21] are weighted averages of the response variables. These estimates are therefore very sensitive to large fluctuations of the variables, so the final estimator of $\boldsymbol{\beta}$ can be seriously affected by anomalous data, as mentioned in the Introduction. To overcome this problem, [13] considered two families of robust estimators for the regression function when the explanatory variables $t_i$ take values on a Riemannian manifold $(M, \gamma)$. The first family combines the ideas of robust smoothing in Euclidean spaces with the kernel weights introduced in [21]. The second generalizes to our setting the proposal given by [5], who considered robust nonparametric estimates using nearest neighbor weights when the predictors $t$ are in $\mathbb{R}^d$.

Based on the robust nonparametric estimators proposed in [13] and the ideas considered in [2] for partially linear models in the Euclidean case, we propose a class of estimates based on a three-step robust procedure under the partly linear model when some of the predictors take values on a Riemannian manifold. The three-step robust estimators are defined as follows:

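Step 1: Estimate $\phi_j(t)$, $0 \le j \le p$, through a robust smoothing. Denote by $\hat\phi_{j,R}$ the obtained estimates and $\hat{\boldsymbol{\phi}}_R(t) = (\hat\phi_{1,R}(t), \dots, \hat\phi_{p,R}(t))^{\mathrm{T}}$.

Step 2: Estimate the regression parameter by applying a robust regression estimate to the residuals $y_i - \hat\phi_{0,R}(t_i)$ and $\mathbf{x}_i - \hat{\boldsymbol{\phi}}_R(t_i)$. Denote by $\hat{\boldsymbol{\beta}}_R$ the obtained estimator.

Step 3: Define the robust estimate of the regression function $g$ as $\hat g_R(t) = \hat\phi_{0,R}(t) - \hat{\boldsymbol{\phi}}_R(t)^{\mathrm{T}}\hat{\boldsymbol{\beta}}_R$.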

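Note that in Step 1 the regression functions correspond to predictors taking values in a Riemannian manifold. Local M-type estimates $\hat\phi_{0,R}$ and $\hat\phi_{j,R}$ are defined in [13] as the solutions of

$$\sum_{i=1}^n w_{n,h}(t, t_i)\, \Psi\!\left(\frac{y_i - \hat\phi_{0,R}(t)}{\hat\sigma_{0,n}(t)}\right) = 0 \quad \text{and} \quad \sum_{i=1}^n w_{n,h}(t, t_i)\, \Psi\!\left(\frac{x_{ij} - \hat\phi_{j,R}(t)}{\hat\sigma_{j,n}(t)}\right) = 0, \qquad (4)$$

respectively, where the score function $\Psi$ is strictly increasing, bounded and continuous, and $\hat\sigma_{0,n}(\tau)$ and $\hat\sigma_{j,n}(\tau)$, $1 \le j \le p$, are local robust scale estimates.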

Possible choices for the score function $\Psi$ are the Huber or the bisquare $\Psi$-functions. The local robust scale estimates $\hat\sigma_{0,n}(\tau)$ and $\hat\sigma_{j,n}(\tau)$, $1 \le j \le p$, can be taken as the local median of the absolute deviations from the local median (local MAD), i.e., the MAD (see [H]) with respect to the distributions

$$\hat F_n(y \mid t = \tau) = \sum_{i=1}^n w_{n,h}(\tau, t_i)\, I_{(-\infty, y]}(y_i) \quad \text{and} \quad \hat F_{j,n}(x \mid t = \tau) = \sum_{i=1}^n w_{n,h}(\tau, t_i)\, I_{(-\infty, x]}(x_{ij}), \qquad (5)$$

respectively.
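Given the weights $w_{n,h}(\tau, t_i)$ at a fixed $\tau$, the local location and scale in (4) and (5) reduce to weighted one-dimensional problems. The sketch below assumes Huber's $\Psi$ with tuning constant $c = 1.345$, takes the local median as starting value and the unnormalised local MAD as $\hat\sigma_{n}(\tau)$, and solves (4) by iterative reweighting; these choices and the function names are ours, for illustration only.

```python
def weighted_median(values, weights):
    """Smallest v with F_n(v) >= 1/2, F_n being the weighted empirical cdf in (5)."""
    total = sum(weights)
    acc = 0.0
    for v, w in sorted(zip(values, weights)):
        acc += w
        if acc >= 0.5 * total:
            return v
    return max(values)

def local_mad(values, weights):
    """Local MAD: weighted median of |v - local median| with the same weights."""
    med = weighted_median(values, weights)
    return weighted_median([abs(v - med) for v in values], weights)

def huber_psi(u, c=1.345):
    return max(-c, min(c, u))

def local_m_estimate(values, weights, c=1.345, tol=1e-10, max_iter=200):
    """Solve sum_i w_i psi((v_i - a) / sigma) = 0 in a by iterative reweighting."""
    a = weighted_median(values, weights)          # robust starting value
    sigma = local_mad(values, weights)
    if sigma == 0.0:
        return a                                  # at least half of the weight sits at one value
    for _ in range(max_iter):
        num = den = 0.0
        for v, w in zip(values, weights):
            u = (v - a) / sigma
            wt = w * (1.0 if u == 0.0 else huber_psi(u, c) / u)   # w_i * psi(u) / u
            num += wt * v
            den += wt
        a_new = num / den
        if abs(a_new - a) < tol:
            return a_new
        a = a_new
    return a

if __name__ == "__main__":
    vals = [1.0, 2.0, 3.0, 4.0, 100.0]
    print(local_m_estimate(vals, [0.2] * 5))  # stays near 3, unlike the weighted mean 22
```

A fixed point of the reweighting satisfies $\sum_i w_i \Psi(u_i) = 0$, i.e. equation (4), so the loop is just a numerical way to find that zero.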

In Step 2, the robust estimation of the regression parameter can be performed by applying to the residuals any of the robust methods proposed for linear regression. For example, we can consider M-estimates ([15]) and GM-estimators ([19]). On the other hand, high-breakdown-point estimates with high efficiency, such as MM-estimates, could be used ([26] and [27]).

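We consider $\hat{\boldsymbol{\beta}}_R$ the solution of

$$\sum_{i=1}^n \psi_1\!\left(\frac{\hat r_i - \hat{\boldsymbol{\eta}}_i^{\mathrm{T}}\boldsymbol{\beta}}{s_n}\right) w_1(\|\hat{\boldsymbol{\eta}}_i\|)\,\hat{\boldsymbol{\eta}}_i = \mathbf{0}, \qquad (6)$$

with $s_n$ a robust consistent estimate of $\sigma_\varepsilon$, $\hat r_i = y_i - \hat\phi_{0,R}(t_i)$, $\hat{\boldsymbol{\eta}}_i = \mathbf{x}_i - \hat{\boldsymbol{\phi}}_R(t_i)$, $\psi_1$ a score function and $w_1$ a weight function. The zero of this equation can be computed iteratively using reweighting, as described for the location setting in [20, Chapter 2].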
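For $p = 1$ the reweighting scheme for (6) can be sketched as follows. The choices are ours and only illustrative: $\psi_1$ is Huber's function with $c = 1.345$; $w_1(u) = \min\{1, (k/u)^2\}$ with $k = 2$ is a hypothetical Mallows-type weight; the starting value is the median of the ratios $\hat r_i/\hat\eta_i$; and $s_n$ is the normalised MAD of the initial residuals, kept fixed during the iterations. A fixed point $b = \sum_i W_i w_{1i}\hat\eta_i\hat r_i / \sum_i W_i w_{1i}\hat\eta_i^2$, with $W_i = \psi_1(u_i)/u_i$, is exactly a zero of (6).

```python
import random
import statistics

def huber_psi(u, c=1.345):
    return max(-c, min(c, u))

def gm_slope(r, eta, c=1.345, k=2.0, tol=1e-10, max_iter=500):
    """Solve sum_i psi1((r_i - eta_i*b)/s) w1(|eta_i|) eta_i = 0 for scalar b (p = 1)."""
    w1 = [1.0 if abs(e) <= k else (k / abs(e)) ** 2 for e in eta]  # hypothetical Mallows-type weight
    b = statistics.median([ri / ei for ri, ei in zip(r, eta) if ei != 0.0])  # crude robust start
    res0 = [ri - ei * b for ri, ei in zip(r, eta)]
    med = statistics.median(res0)
    s = statistics.median([abs(x - med) for x in res0]) / 0.6745       # normalised MAD, kept fixed
    for _ in range(max_iter):
        num = den = 0.0
        for ri, ei, wi in zip(r, eta, w1):
            u = (ri - ei * b) / s
            W = 1.0 if u == 0.0 else huber_psi(u, c) / u               # IRLS weight psi1(u)/u
            num += W * wi * ei * ri
            den += W * wi * ei * ei
        b_new = num / den
        if abs(b_new - b) < tol:
            return b_new, s
        b = b_new
    return b, s

if __name__ == "__main__":
    rng = random.Random(7)
    eta = [rng.gauss(0.0, 1.0) for _ in range(180)] + [10.0] * 20      # 10% bad leverage points
    r = [2.0 * e + rng.gauss(0.0, 0.5) for e in eta[:180]] + [-20.0] * 20
    print(gm_slope(r, eta))  # slope close to 2 despite the outliers
```

On the same data the least squares slope $\sum \hat\eta_i \hat r_i / \sum \hat\eta_i^2$ becomes negative, which is the kind of breakdown the robust Step 2 is meant to avoid.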

The estimator defined by [21] corresponds to the choice $\Psi(u) = u$ with the estimators of the conditional distribution based on the kernel weights defined in (5). Therefore, if we consider the least squares estimator of $\boldsymbol{\beta}$ in Step 2, we obtain the classical estimators proposed in [11]. On the other hand, when $(M, \gamma)$ is $\mathbb{R}^d$ endowed with the canonical metric, the estimation procedure reduces to the proposal introduced in [2]. Details on the procedure to compute the robust nonparametric estimators in Step 1 can be found in [13].

3 Asymptotic results 

The theorems of this section study the asymptotic behavior of the regression parameter estimator under standard conditions. Let $U$ be an open set of $M$; we denote by $C^k(U)$ the set of $k$ times continuously differentiable functions from $U$ to $\mathbb{R}$. As in [21], we assume that the image measure of $P$ by $t$ is absolutely continuous with respect to the Riemannian volume measure $\nu_\gamma$, and we denote by $f$ its density on $M$ with respect to $\nu_\gamma$.

Let $\sigma_0(\tau)$ and $\sigma_j(\tau)$, $1 \le j \le p$, be the MAD of the conditional distributions of $y \mid t = \tau$ and $x_j \mid t = \tau$, respectively.

3.1 Consistency 

To derive the strong consistency of the estimate $\hat{\boldsymbol{\beta}}_R$ of $\boldsymbol{\beta}$ defined in Step 2, we will consider the following set of assumptions.

H1. $\Psi: \mathbb{R} \to \mathbb{R}$ is an odd, strictly increasing, bounded and continuously differentiable function such that $u\Psi'(u) \le \Psi(u)$ for $u > 0$.
H2. $F(y \mid t = \tau)$ and $F_j(x \mid t = \tau)$ are symmetric around $\phi_0(\tau)$ and $\phi_j(\tau)$, respectively, and they are continuous functions of $y$ and $x$ for each $\tau$.

H3. $M_0$ is a compact set on $M$ such that:

i) The density function $f$ of $t$ is a bounded function such that $\inf_{\tau \in M_0} f(\tau) = A > 0$.

ii) $\inf_{\tau \in M_0} \inf_{s \in M_0} \theta_\tau(s) = B > 0$.

H4. The following equicontinuity condition holds:

$$\forall \epsilon > 0,\ \exists \delta > 0: \quad |z - z'| < \delta \ \Rightarrow\ \sup_{s \in M_0} |G_s(z) - G_s(z')| < \epsilon$$

for the functions $G_s(z)$ equal to $F(z \mid t = s)$ and $F_j(z \mid t = s)$ for $1 \le j \le p$.
H5. For any open set $U_0$ of $M$ such that $M_0 \subset U_0$:

i) $f$ is of class $C^2$ on $U_0$.

ii) $F(y \mid t = \tau)$ and $F_j(x \mid t = \tau)$ are uniformly Lipschitz in $U_0$, that is, there exists a constant $C > 0$ such that $|G_\tau(z) - G_s(z)| \le C\, d_\gamma(\tau, s)$ for all $\tau, s \in U_0$ and $z \in \mathbb{R}$, for the functions $G_s(z)$ equal to $F(z \mid t = s)$ and $F_j(z \mid t = s)$ for $1 \le j \le p$.

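H6. $K: \mathbb{R} \to \mathbb{R}$ is a bounded nonnegative Lipschitz function of order one, with compact support $[0, 1]$, satisfying $\int_{\mathbb{R}^d} \mathbf{u}\, K(\|\mathbf{u}\|)\, d\mathbf{u} = \mathbf{0}$ and $0 < \int_{\mathbb{R}^d} \|\mathbf{u}\|^2 K(\|\mathbf{u}\|)\, d\mathbf{u} < \infty$.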

H7. The sequence $h_n$ is such that $h_n \to 0$ and $n h_n^d / \log n \to \infty$ as $n \to \infty$.

H8. The estimators $\hat\sigma_{j,n}(\tau)$ of $\sigma_j(\tau)$ satisfy $\hat\sigma_{j,n}(\tau) \xrightarrow{a.s.} \sigma_j(\tau)$ as $n \to \infty$ for all $\tau \in M_0$ and $0 \le j \le p$.



Remark 3.1.1. Assumption H1 is a standard condition in a robustness framework. The fact that $\theta_s(s) = 1$ for all $s \in M$ guarantees that H3 holds for a small compact neighborhood of $s$. H4 and H5 are needed in order to derive strong uniform consistency results. Assumption H6 is a standard assumption when dealing with kernel estimators. It is easy to see that Assumption H8 is satisfied when we consider $\hat\sigma_{j,n}(\tau)$ as the local median of the absolute deviations from the local median.

Theorem 3.1.1. Under hypotheses H1 to H8, we have that

a) $|\hat{\boldsymbol{\beta}}_R - \boldsymbol{\beta}| \xrightarrow{a.s.} 0$.

b) $\sup_{\tau \in M_0} |\hat g_R(\tau) - g(\tau)| \xrightarrow{a.s.} 0$.
3.2 Asymptotic distribution

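In this section, we assume that in Step 2 of the estimation procedure the choice for $\hat{\boldsymbol{\beta}}_R$ is given as in (6). More precisely, let $\psi_1$ be a score function and $w_1$ a weight function; we will derive the asymptotic distribution of the regression parameter estimates $\hat{\boldsymbol{\beta}}_R$ defined as a solution of

$$\sum_{i=1}^n \psi_1\!\left(\frac{\hat r_i - \hat{\boldsymbol{\eta}}_i^{\mathrm{T}}\boldsymbol{\beta}}{s_n}\right) w_1(\|\hat{\boldsymbol{\eta}}_i\|)\,\hat{\boldsymbol{\eta}}_i = \mathbf{0},$$

with $s_n$ a robust consistent estimate of $\sigma_\varepsilon$, $\hat r_i = y_i - \hat\phi_{0,R}(t_i)$ and $\hat{\boldsymbol{\eta}}_i = \mathbf{x}_i - \hat{\boldsymbol{\phi}}_R(t_i)$. Denote $\boldsymbol{\eta}_i = \mathbf{x}_i - \boldsymbol{\phi}(t_i)$ and $r_i = y_i - \phi_0(t_i)$. Note that $r_i - \boldsymbol{\eta}_i^{\mathrm{T}}\boldsymbol{\beta} = \varepsilon_i$.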

To derive the asymptotic distribution of the regression parameter estimates, we will need 
the following set of assumptions. 

A1. $\psi_1$ is an odd, bounded and twice continuously differentiable function with bounded derivatives $\psi_1'$ and $\psi_1''$, such that the functions $u\psi_1'(u)$ and $u\psi_1''(u)$ are bounded.

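A2. $E(w_1(\|\boldsymbol{\eta}_1\|)\,\boldsymbol{\eta}_1 \mid t) = \mathbf{0}$, $E(w_1(\|\boldsymbol{\eta}_1\|)\,\|\boldsymbol{\eta}_1\|^2) < \infty$, and $\mathbf{A} = E\left(\psi_1'(\varepsilon/\sigma_\varepsilon)\, w_1(\|\boldsymbol{\eta}_1\|)\,\boldsymbol{\eta}_1\boldsymbol{\eta}_1^{\mathrm{T}}\right)$ is nonsingular.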

A3. The function $w_1(u)$ is bounded and Lipschitz of order 1. Moreover, $\varphi(u) = w_1(u)\,u$ is also a bounded and continuously differentiable function with bounded derivative $\varphi'(u)$ such that $u\varphi'(u)$ is bounded.

A4. The functions $\phi_j(t)$, $0 \le j \le p$, are continuous with continuous first derivatives in $M$.

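A5. The estimates $\hat\phi_j(t)$ of $\phi_j(t)$, $0 \le j \le p$, have continuous first derivatives in $M$ and

$$n^{1/4} \sup_{t \in M} |\hat\phi_j(t) - \phi_j(t)| \xrightarrow{p} 0, \qquad 0 \le j \le p, \qquad (7)$$

$$\sup_{t \in M} \|\nabla\hat\phi_j(t) - \nabla\phi_j(t)\| \xrightarrow{p} 0, \qquad 0 \le j \le p, \qquad (8)$$

where $\nabla\xi$ denotes the gradient of $\xi$, with $\xi \in \mathcal{F}(M)$ and $\mathcal{F}(M)$ the class of functions $\{\xi \in C^1(M): \|\xi\|_\infty \le 1,\ \|\nabla\xi\|_\infty \le 1\}$.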

A6. The estimator $s_n$ of $\sigma_\varepsilon$ satisfies $s_n \xrightarrow{p} \sigma_\varepsilon$ as $n \to \infty$.

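Theorem 3.2.1. Under assumptions A1 to A6, we have that

$$\sqrt{n}\,(\hat{\boldsymbol{\beta}}_R - \boldsymbol{\beta}) \xrightarrow{D} N\!\left(\mathbf{0},\ \sigma_\varepsilon^2\, \mathbf{A}^{-1}\boldsymbol{\Sigma}\mathbf{A}^{-1}\right),$$

where $\mathbf{A}$ is defined in A2 and $\boldsymbol{\Sigma} = E\left(\psi_1^2(\varepsilon/\sigma_\varepsilon)\right) E\left(w_1^2(\|\boldsymbol{\eta}_1\|)\,\boldsymbol{\eta}_1\boldsymbol{\eta}_1^{\mathrm{T}}\right)$.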
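In practice the unknown quantities in the asymptotic variance have to be estimated. The following sketch shows one possible plug-in version for $p = 1$, replacing $\sigma_\varepsilon$ by $s_n$ and the expectations by sample means of the standardized residuals $u_i = (\hat r_i - \hat\eta_i\hat\beta_R)/s_n$. The plug-in scheme, the use of Huber's $\psi_1$ (which is not twice differentiable, so it does not satisfy A1 and is used here only for illustration) and the function names are our assumptions, not a procedure prescribed in the text.

```python
import math
from statistics import NormalDist

def huber_psi(u, c=1.345):
    return max(-c, min(c, u))

def huber_psi_prime(u, c=1.345):
    # Derivative of Huber's psi (taken as 1 at |u| = c).
    return 1.0 if abs(u) <= c else 0.0

def sandwich_ci(beta_hat, s_n, r, eta, w1, level=0.95, c=1.345):
    """Plug-in interval beta_hat +/- z * sqrt(s_n^2 Sigma / (n A^2)) for p = 1."""
    n = len(r)
    u = [(ri - ei * beta_hat) / s_n for ri, ei in zip(r, eta)]    # standardised residuals
    w = [w1(abs(ei)) for ei in eta]
    a_hat = sum(huber_psi_prime(ui, c) * wi * ei * ei for ui, wi, ei in zip(u, w, eta)) / n
    e_psi2 = sum(huber_psi(ui, c) ** 2 for ui in u) / n
    e_w2eta2 = sum((wi * ei) ** 2 for wi, ei in zip(w, eta)) / n
    avar = s_n ** 2 * e_psi2 * e_w2eta2 / a_hat ** 2              # sigma^2 A^{-1} Sigma A^{-1}
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half = z * math.sqrt(avar / n)
    return beta_hat - half, beta_hat + half

if __name__ == "__main__":
    # With a huge Huber constant and w1 = 1 this reduces to a least squares-type interval.
    print(sandwich_ci(2.0, 1.0, [1.5, -2.5, 4.5, -3.5], [1.0, -1.0, 2.0, -2.0], lambda u: 1.0, c=1e9))
```

Because $\Sigma$ factors as a product of two expectations, the two sample means are computed separately, mirroring the formula in the theorem.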

Remark 3.2.1. To prove the previous result, we need an inequality for the covering number of $\mathcal{F}(M)$. The Appendix includes some results related to covering numbers on a Riemannian manifold.
4 Bandwidth Selection



To select the smoothing parameter, two approaches are commonly used: $L^2$ cross-validation and plug-in methods. However, these procedures may not be robust. Their sensitivity to anomalous data was discussed by several authors; see, for example, [17], [28], [6], [8] and [18]. Under a nonparametric regression model with carriers in a Euclidean space, [8] introduced a robust cross-validation criterion to select the bandwidth parameter for spline-based estimators. Robust cross-validation selectors for kernel M-smoothers were considered in [17], [28] and [16], under a fully nonparametric regression model. In the Euclidean setting, for partially linear models, a robust cross-validation criterion in the case of autoregression models was considered in [3], while a robust plug-in procedure was studied in [7]. When the variables belong to a Riemannian manifold, a robust cross-validation procedure was discussed in [13] under a fully nonparametric regression model, while a classical cross-validation procedure under partly linear models was considered in [11].

We include a robust cross-validation method for the choice of the bandwidth in the case of partially linear models that robustifies the proposal given in [11]. The robust cross-validation method provides a data-driven bandwidth, and thus adaptive data-driven estimators, by minimizing

$$RCV(h) = \sum_{i=1}^n \Psi^2\!\left((y_i - \hat\phi_{0,-i,h}(t_i)) - (\mathbf{x}_i - \hat{\boldsymbol{\phi}}_{-i,h}(t_i))^{\mathrm{T}}\hat{\boldsymbol{\beta}}\right),$$

where $\Psi$ is a bounded score function such as Huber's function, $\hat{\boldsymbol{\phi}}_{-i,h}(t) = (\hat\phi_{1,-i,h}(t), \dots, \hat\phi_{p,-i,h}(t))^{\mathrm{T}}$ and $\hat\phi_{0,-i,h}(t)$ denote the robust nonparametric estimators computed with bandwidth $h$ using all the data except the $i$-th observation, and $\hat{\boldsymbol{\beta}}$ is obtained by applying a robust regression estimate to the residuals $y_i - \hat\phi_{0,-i,h}(t_i)$ and $\mathbf{x}_i - \hat{\boldsymbol{\phi}}_{-i,h}(t_i)$.
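The criterion $RCV(h)$ can be sketched as follows for $p = 1$ and $t$ on the unit circle. This is an illustration under our own assumptions, not the code used for the paper: leave-one-out local Huber M-smoothers with local MAD scale (arc-length distance, unit volume density, quartic kernel) play the role of $\hat\phi_{0,-i,h}$ and $\hat\phi_{1,-i,h}$; a plain Huber regression through the origin (i.e. $w_1 \equiv 1$) plays the role of the robust estimate of $\beta$; and $\Psi$ is Huber's function with $c = 1.345$. All helper names are hypothetical.

```python
import math
import random
import statistics

def circle_dist(a, b):
    d = abs(a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)

def quartic(u):
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0

def huber(u, c=1.345):
    return max(-c, min(c, u))

def wmedian(vals, w):
    """Weighted median: smallest value whose cumulative weight reaches half the total."""
    half, acc = 0.5 * sum(w), 0.0
    for v, wi in sorted(zip(vals, w)):
        acc += wi
        if acc >= half:
            return v
    return max(vals)

def local_m(t, ts, vals, h, skip, iters=20):
    """Leave-one-out local Huber M-smoother at t with local MAD scale (observation `skip` removed)."""
    w = [0.0 if j == skip else quartic(circle_dist(t, s) / h) for j, s in enumerate(ts)]
    if sum(w) == 0.0:
        return None
    a = wmedian(vals, w)
    sigma = wmedian([abs(v - a) for v in vals], w) or 1e-12  # guard against a zero MAD
    for _ in range(iters):
        wt = [wi if v == a else wi * huber((v - a) / sigma) * sigma / (v - a) for v, wi in zip(vals, w)]
        a = sum(x * v for x, v in zip(wt, vals)) / sum(wt)
    return a

def huber_slope(r, eta, iters=50):
    """Huber M-estimate of the slope of r on eta (no intercept, p = 1) with a MAD-type scale."""
    b = statistics.median([ri / ei for ri, ei in zip(r, eta) if ei != 0.0])
    s = statistics.median([abs(ri - ei * b) for ri, ei in zip(r, eta)]) / 0.6745 or 1e-12
    for _ in range(iters):
        res = [ri - ei * b for ri, ei in zip(r, eta)]
        W = [1.0 if e == 0.0 else huber(e / s) * s / e for e in res]
        b = sum(Wi * ei * ri for Wi, ei, ri in zip(W, eta, r)) / sum(Wi * ei * ei for Wi, ei in zip(W, eta))
    return b

def rcv(h, ys, xs, ts):
    """RCV(h) = sum_i Psi^2(residual_i) with leave-one-out robust smoothers."""
    n = len(ts)
    phi0 = [local_m(ts[i], ts, ys, h, i) for i in range(n)]
    phi1 = [local_m(ts[i], ts, xs, h, i) for i in range(n)]
    if any(v is None for v in phi0 + phi1):
        return math.inf  # some t_i has no other observation within distance h
    r = [y - p for y, p in zip(ys, phi0)]
    eta = [x - p for x, p in zip(xs, phi1)]
    b = huber_slope(r, eta)
    return sum(huber(ri - ei * b) ** 2 for ri, ei in zip(r, eta))

if __name__ == "__main__":
    rng = random.Random(3)
    ts = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(80)]
    xs = [math.sin(t) + rng.gauss(0.0, 1.0) for t in ts]
    ys = [2.0 * x + math.cos(t) + rng.gauss(0.0, 0.5) for x, t in zip(xs, ts)]
    grid = [0.3, 0.6, 1.0, 1.5]
    print(min(grid, key=lambda h: rcv(h, ys, xs, ts)))  # selected bandwidth
```

Because $\Psi$ is bounded, each observation contributes at most $c^2$ to $RCV(h)$, so a few gross outliers cannot dominate the choice of $h$ as they can with the squared-error criterion.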

The asymptotic properties of data-driven estimators require further careful investigation 
and are beyond the scope of this paper. 

5 Simulation study 

In this section, we consider a simulation study designed to evaluate the performance of the robust procedure introduced in Section 2. The main objective of this study is to compare the behavior of the classical and robust estimators under normal samples and under contamination. We consider the cylinder endowed with the metric induced by the canonical metric of $\mathbb{R}^3$. Because of the computational burden of the robust procedure, we performed 500 replications of independent samples of size $n = 200$. In the smoothing procedure, the kernel was taken as the quartic (biweight) kernel $K(t) = (15/16)(1 - t^2)^2 I(|t| < 1)$, and we chose the bandwidth using the robust cross-validation procedure described in Section 4 for the robust estimators and the classical cross-validation described in [11] for the classical estimators. The distance $d_\gamma$ and the volume density function for the cylinder were computed in [14] and [15]. We considered the following model.
The variables $(y_i, x_i, t_i)$, $1 \le i \le n$, were generated as

$$y_i = 2x_i + (t_{1i} + t_{2i} - t_{3i})^2 + \varepsilon_i \quad \text{and} \quad x_i = \sin(2t_{3i}) + \eta_i,$$

where $t_i = (t_{1i}, t_{2i}, t_{3i}) = (\cos(\theta_i), \sin(\theta_i), s_i)$, with the variables $\theta_i$ uniformly distributed on $(0, 2\pi)$ and the variables $s_i$ uniform on $(0, 1)$; i.e., $t_i$ has support on the cylinder with radius 1 and height in $(0, 1)$.
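The data-generating step can be sketched as below. The distance is our own closed form, not taken from [14] or [15]: the unit cylinder is flat (locally isometric to the plane), so its geodesic distance combines the shorter angular arc and the height difference as in the plane, and its volume density is identically 1. The sampler signature and function names are hypothetical.

```python
import math
import random

def sample_model(n, rng, eps_sampler=None):
    """Draw (y_i, x_i, t_i), 1 <= i <= n, from the simulation model (beta = 2)."""
    if eps_sampler is None:
        eps_sampler = lambda: rng.gauss(0.0, 1.0)       # non-contaminated errors
    data = []
    for _ in range(n):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        s = rng.uniform(0.0, 1.0)
        t = (math.cos(theta), math.sin(theta), s)       # point on the cylinder of radius 1
        x = math.sin(2.0 * t[2]) + rng.gauss(0.0, 0.05)
        y = 2.0 * x + (t[0] + t[1] - t[2]) ** 2 + eps_sampler()
        data.append((y, x, t))
    return data

def cylinder_dist(t, u):
    """Geodesic distance on the unit cylinder: angular arc and height combine as in the plane."""
    dang = abs(math.atan2(t[1], t[0]) - math.atan2(u[1], u[0])) % (2.0 * math.pi)
    dang = min(dang, 2.0 * math.pi - dang)
    return math.hypot(dang, t[2] - u[2])

if __name__ == "__main__":
    sample = sample_model(200, random.Random(2024))
    print(sample[0], cylinder_dist(sample[0][2], sample[1][2]))
```

Passing a different `eps_sampler` is how a contaminated error distribution would be plugged in.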

The non-contaminated case, denoted $C_0$, corresponds to errors $\varepsilon_i$ and $\eta_i$ that are i.i.d. normal with mean 0 and standard deviations 1 and 0.05, respectively. The so-called contaminations $C_1$ and $C_2$, which correspond to selecting a distribution in a neighborhood of the central normal distribution, are defined as $\varepsilon \sim 0.9N(0, 1) + 0.1N(0, 25)$ and $\varepsilon \sim 0.9N(0, 1) + 0.1N(5, 0.25)$, respectively. The contamination $C_1$ corresponds to inflating the errors and thus will affect the variance of the regression estimates.
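The three error scenarios can be generated with a two-component mixture, as in the sketch below. We read the second argument of $N(\mu, v)$ as a variance, so $N(0, 25)$ has standard deviation 5 and $N(5, 0.25)$ has standard deviation 0.5; this parametrization and the function name are our assumptions.

```python
import random

def make_error_sampler(scenario: str, rng: random.Random):
    """Return a zero-argument sampler for the errors under C0, C1 or C2.

    N(mu, v) is read with v as a variance, so N(0, 25) has sd 5 and N(5, 0.25) has sd 0.5.
    """
    if scenario == "C0":
        return lambda: rng.gauss(0.0, 1.0)
    if scenario == "C1":   # 0.9 N(0, 1) + 0.1 N(0, 25): inflated errors
        return lambda: rng.gauss(0.0, 5.0) if rng.random() < 0.1 else rng.gauss(0.0, 1.0)
    if scenario == "C2":   # 0.9 N(0, 1) + 0.1 N(5, 0.25): shifted errors
        return lambda: rng.gauss(5.0, 0.5) if rng.random() < 0.1 else rng.gauss(0.0, 1.0)
    raise ValueError(f"unknown scenario {scenario!r}")

if __name__ == "__main__":
    rng = random.Random(0)
    for name in ("C0", "C1", "C2"):
        draws = [make_error_sampler(name, rng)() for _ in range(10000)]
        print(name, sum(draws) / len(draws))  # C2 has mean 0.1 * 5 = 0.5
```

Unlike $C_1$, the mixture in $C_2$ is not symmetric around zero (its mean is $0.5$), so it can bias estimators that are not resistant to asymmetric outliers.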

5.1 Simulation results 

Table 5.1 shows the mean, standard deviation and mean square error of the estimates of $\beta$, and the mean of the mean square error of the estimates of the regression function $g$, over the 500 replications for the considered model. We denote by ls and R the classical and robust estimators, respectively. Figure 5.1 shows the boxplots of the regression parameter estimates.





Estimator   Scenario   mean($\hat\beta$)   sd($\hat\beta$)   MSE($\hat\beta$)   MSE($\hat g$)
ls          $C_0$      2.0732              0.1445            0.0262             0.2396
ls          $C_1$      1.8789              1.7592            3.1095             20.4485
ls          $C_2$      1.8722              1.7975            3.2475             45.9719
R           $C_0$      2.0646              0.1433            0.0247             0.2431
R           $C_1$      2.0198              0.2303            0.0534             0.4897
R           $C_2$      2.0109              0.2540            0.0646             1.3580

Table 5.1: Performance of the estimators of the regression parameter and the regression function under the different contaminations (ls: classical; R: robust).

The simulation results confirm the inadequate behavior of the classical estimators under the considered contaminations. The robust estimators introduced in this work show only a small loss of efficiency under normality. Under both $C_1$ and $C_2$, the results obtained with the classical estimators are not reliable, giving much higher mean square errors than those of the robust procedure. This extreme behavior of the classical estimators shows their inadequacy when one suspects that the sample may contain outliers.
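As a quick consistency check on the table, the reported mean square errors of $\hat\beta$ agree with the bias-variance decomposition $\mathrm{MSE} \approx (\mathrm{mean} - 2)^2 + \mathrm{sd}^2$, since the true value is $\beta = 2$. Small discrepancies are expected from rounding and from the denominator used for the standard deviation over the replications. The snippet below only re-uses the numbers reported above.

```python
# Rows of the simulation table: (estimator, scenario, mean, sd, reported MSE); true beta = 2.
TABLE = [
    ("ls", "C0", 2.0732, 0.1445, 0.0262),
    ("ls", "C1", 1.8789, 1.7592, 3.1095),
    ("ls", "C2", 1.8722, 1.7975, 3.2475),
    ("R", "C0", 2.0646, 0.1433, 0.0247),
    ("R", "C1", 2.0198, 0.2303, 0.0534),
    ("R", "C2", 2.0109, 0.2540, 0.0646),
]

def mse_from_moments(mean, sd, beta=2.0):
    """Bias-variance decomposition: MSE = (mean - beta)^2 + sd^2."""
    return (mean - beta) ** 2 + sd ** 2

for est, scen, mean, sd, mse in TABLE:
    print(f"{est:>2} {scen}: reported {mse:.4f}, bias^2 + sd^2 = {mse_from_moments(mean, sd):.4f}")
```

The decomposition also shows where the classical estimator fails: under $C_1$ and $C_2$ its MSE is driven almost entirely by the variance term.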
Figure 5.1: Boxplots of a) the classical estimators $\hat\beta_{ls}$ and b) the robust estimators $\hat\beta_R$ under the different contaminations.



6 Real Example 

Solar insolation is the amount of electromagnetic energy, or solar radiation, incident on the surface of the Earth. This variable measures the duration of sunlight in seconds. For automatic stations, the World Meteorological Organization defines insolation as the sum of the time intervals in which the irradiance exceeds the threshold of 120 watts per square meter. Here the irradiance is the direct solar radiation received on a surface normal (perpendicular) to the sun's rays at the Earth's surface. The values of insolation at a particular location depend on the weather conditions and on the sun's position above the horizon. For example, the presence of clouds increases the absorption, reflection and dispersion of the solar radiation. Desert areas, given the lack of clouds, have the highest values of insolation on the planet. More details about insolation can be found in [3].
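The threshold definition quoted above translates directly into a count of sampling intervals. The sketch assumes equally spaced irradiance readings in W/m², each standing for one interval of `step_seconds`; the function name and the sampling scheme are illustrative assumptions, not a description of the station's software.

```python
def sunshine_duration(irradiance, step_seconds, threshold=120.0):
    """Seconds of sunshine: total length of sampling intervals whose direct normal
    irradiance (W/m^2) exceeds the threshold; samples are assumed equally spaced."""
    return step_seconds * sum(1 for g in irradiance if g > threshold)

# Five one-minute samples, two of them strictly above 120 W/m^2.
print(sunshine_duration([0.0, 100.0, 130.0, 500.0, 120.0], step_seconds=60))  # 120
```

Note that a reading exactly at the threshold does not count, since the definition requires the irradiance to exceed it.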

As noted above, insolation is related to the weather conditions. In particular, to illustrate the proposed estimators, we analyze the relation between insolation, humidity, and the direction and speed of the wind. We consider a data set available at http://meteo.navarra.es/. These data consist of the daily averages of relative humidity, wind speed, wind direction and insolation. The wind direction was measured with the zero point in the north direction, and the wind speed was measured in meters per second. The data were recorded daily at the automatic meteorological station of Pamplona-Larrabide GN, in Navarra, Spain, during the year 2004. In our study, we consider a random sample of this dataset.

In Figure 6.1, we can see that humidity and insolation follow a linear relation, except for the outliers contained in the ellipse. Therefore, we consider a partially linear model to explain insolation as a linear function of humidity and a nonparametric function of the wind speed and direction. Note that the wind variables, which are modeled nonparametrically, take values on a cylinder. In the smoothing procedure, we consider the quartic (biweight) kernel $K(t) = (15/16)(1 - t^2)^2 I(|t| < 1)$, and we select the bandwidth using the robust cross-validation procedure for the robust estimators and the classical cross-validation described in [11] for the classical estimators.




Figure 6.1: Scatterplot of insolation versus humidity. The dots and the asterisks correspond to the original sample. The triangles correspond to the two outliers introduced in place of the asterisks.

In a first step, we apply the classical and robust methods to obtain an estimator of the regression parameter using all the data. The results were $\hat\beta_{ls} = -1032.869$ and $\hat\beta_R = -1246.856$. Also, based on the asymptotic results obtained in Theorem 3.2.1, we calculated a confidence interval at significance level 0.05 (i.e., a 95% confidence interval) in each case. To compute these confidence intervals, we estimated the unknown quantities. The classical confidence interval was $CI_{0.05}(\beta) = (-1229.9451, -835.7935)$, and the confidence interval based on the robust estimator was $CI_{0.05}(\beta) = (-1453.658, -1040.053)$. On the other hand, we calculated the classical estimator using the data without the outliers; the result was $\hat\beta_{ls} = -1294.620$, with confidence interval $CI_{0.05}(\beta) = (-1502.983, -1086.257)$. Thus, if we estimate the regression parameter with the classical approach when the dataset has outliers, the conclusion can be different. In the classical case with all the data, the hypothesis $\beta = -1000$ is not rejected, since $-1000$ lies inside the interval, while both the classical estimator without the outliers and the robust estimator with all the data reject it.
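The reported intervals can be checked with a few lines of arithmetic: each one is symmetric around its point estimate, the implied standard error is the half-width divided by $z_{0.975} \approx 1.96$, and a two-sided test of $\beta = -1000$ at level 0.05 rejects exactly when $-1000$ falls outside the interval. The snippet only re-uses the numbers reported above; the dictionary keys and function names are ours.

```python
from statistics import NormalDist

# (point estimate, lower, upper) as reported in the text; 95% intervals.
INTERVALS = {
    "classical, all data": (-1032.869, -1229.9451, -835.7935),
    "robust, all data": (-1246.856, -1453.658, -1040.053),
    "classical, outliers removed": (-1294.620, -1502.983, -1086.257),
}

def implied_se(lower, upper, level=0.95):
    """Standard error implied by a symmetric Wald interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (upper - lower) / (2.0 * z)

def rejects(beta0, lower, upper):
    """Two-sided test at level 0.05: reject iff beta0 lies outside the 95% interval."""
    return not (lower <= beta0 <= upper)

for name, (est, lo, hi) in INTERVALS.items():
    print(f"{name}: midpoint {(lo + hi) / 2:.3f}, se {implied_se(lo, hi):.2f}, "
          f"reject beta = -1000: {rejects(-1000.0, lo, hi)}")
```

The two outlier-free analyses (the robust fit and the classical fit without the outliers) agree with each other, while the classical fit on the contaminated data is the one that changes the conclusion.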
A Appendix

A.1 Proof of Theorem 3.1.1

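a) Denote $\hat r_i = y_i - \hat\phi_{0,R}(t_i)$, $\hat{\boldsymbol{\eta}}_i = \mathbf{x}_i - \hat{\boldsymbol{\phi}}_R(t_i)$, $\boldsymbol{\eta}_i = \mathbf{x}_i - \boldsymbol{\phi}(t_i)$ and $r_i = y_i - \phi_0(t_i)$.

We note that $r_i = \boldsymbol{\eta}_i^{\mathrm{T}}\boldsymbol{\beta} + \varepsilon_i$, and let $P_n(A) = \frac{1}{n}\sum_{i=1}^n I_A(\hat r_i, \hat{\boldsymbol{\eta}}_i)$. It is well known that robust regression estimates can be written as a functional of the empirical distribution. More precisely, $\hat{\boldsymbol{\beta}}_R = \boldsymbol{\beta}(P_n)$, where $\boldsymbol{\beta}(\cdot)$ is continuous at $P$, the common distribution of $(r_i, \boldsymbol{\eta}_i^{\mathrm{T}})^{\mathrm{T}}$. Therefore, it suffices to prove that $\Pi(P_n, P) \xrightarrow{a.s.} 0$, where $\Pi$ stands for the Prohorov distance. Thus, we will show that for any bounded and continuous function $f: \mathbb{R}^{p+1} \to \mathbb{R}$ we have that $|E_{P_n} f - E_P f| \xrightarrow{a.s.} 0$.

Note that

$$|E_{P_n} f - E_P f| \le \frac{1}{n}\sum_{i=1}^n \left|f\big(r_i + (\phi_0(t_i) - \hat\phi_{0,R}(t_i)),\, \boldsymbol{\eta}_i + (\boldsymbol{\phi}(t_i) - \hat{\boldsymbol{\phi}}_R(t_i))\big) - f(r_i, \boldsymbol{\eta}_i)\right| I_C(r_i, \boldsymbol{\eta}_i, t_i) + \frac{2\|f\|_\infty}{n}\sum_{i=1}^n I_{C^c}(r_i, \boldsymbol{\eta}_i, t_i) + \left|\frac{1}{n}\sum_{i=1}^n f(r_i, \boldsymbol{\eta}_i) - E_P f\right|,$$

where, given $\epsilon > 0$, $C_1 \subset \mathbb{R}^{p+1}$ and $M_0 \subset M$ are compact sets such that $P(C) > 1 - \epsilon/(4\|f\|_\infty)$ with $C = C_1 \times M_0$.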

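Under the assumptions of Theorem 3.3 of [13], we have that

$$\sup_{t \in M_0} |\hat\phi_{j,R}(t) - \phi_j(t)| \xrightarrow{a.s.} 0$$

for $0 \le j \le p$. From this fact and the Strong Law of Large Numbers, there exists a set $\mathcal{N} \subset \Omega$ such that $P(\mathcal{N}) = 0$ and, for any $\omega \notin \mathcal{N}$,

$$\frac{1}{n}\sum_{i=1}^n I_{C^c}(r_i, \boldsymbol{\eta}_i, t_i) \to P(C^c) \quad \text{and} \quad \frac{1}{n}\sum_{i=1}^n f(r_i, \boldsymbol{\eta}_i) \to E_P f.$$

Let $\bar C_1$ be the closure of a neighborhood of radius 1 of $C_1$. The uniform continuity of $f$ on $\bar C_1$ implies that there exists $0 < \delta < 1$ such that $\max_{1 \le j \le p+1} |u_j - v_j| < \delta$, with $\mathbf{u}, \mathbf{v} \in \bar C_1$, entails $|f(\mathbf{u}) - f(\mathbf{v})| < \epsilon/4$. Thus, for $\omega \notin \mathcal{N}$ and $n$ large enough, $\max_{0 \le j \le p} \sup_{t \in M_0} |\hat\phi_{j,R}(t) - \phi_j(t)| < \delta$, and then, for $1 \le i \le n$, we obtain that

$$\left|f\big(r_i + (\phi_0(t_i) - \hat\phi_{0,R}(t_i)),\, \boldsymbol{\eta}_i + (\boldsymbol{\phi}(t_i) - \hat{\boldsymbol{\phi}}_R(t_i))\big) - f(r_i, \boldsymbol{\eta}_i)\right| I_C(r_i, \boldsymbol{\eta}_i, t_i) < \frac{\epsilon}{4},$$

which concludes the proof of a).

b) □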

A.2 Entropy number

The main objective of this section is to obtain an upper bound for the entropy number of the class of functions $\mathcal{F}(M) = \{\xi \in C^1(M): \|\xi\|_\infty \le 1,\ \|\nabla\xi\|_\infty \le 1\}$. The covering number $N(\delta, \mathcal{F}, \|\cdot\|)$ is the minimal number of balls $\{\xi: \|\xi - \eta\| < \delta\}$ of radius $\delta$ needed to cover the set $\mathcal{F}$. The entropy number is the logarithm of the covering number. This upper bound will be used to obtain the asymptotic distribution of the regression parameter. Several authors have studied bounds for the covering numbers of different sets; see, for example, [23] and [25]. In particular, [25] obtained an upper bound for the covering number of $\mathcal{F}(M)$ when $M$ is a bounded, convex subset of $\mathbb{R}^d$. For the convenience of the reader, we include the following remark (see [12]).

Remark A.1. Let $N(\delta)$ be the minimal number of balls of radius $\delta$ needed to cover $(M, \gamma)$. A $\delta$-filling is a maximal family of pairwise disjoint open balls of radius $\delta$. We denote by $D(\delta)$ the packing number, i.e., the maximum number of such balls. It is easy to see that $N(2\delta) \le D(\delta)$. Let $\mathrm{diam}(M, \gamma)$ be the diameter of $(M, \gamma)$ and consider $\kappa \in \mathbb{R}$ such that $\mathrm{Ric}_{(M,\gamma)} \ge (d - 1)\kappa$, where $\mathrm{Ric}_{(M,\gamma)}$ is the Ricci curvature and $d$ the dimension of $M$. For example, if $\gamma$ is an Einstein metric with $\mathrm{Ric}_{(M,\gamma)} = (d - 1)\kappa\,\gamma$, then equality is attained. Note that if $\kappa > 0$, then by Myers's Theorem [12], $(M, \gamma)$ is necessarily a compact manifold with $\mathrm{diam}(M, \gamma) \le \pi/\sqrt{\kappa}$. Since $M$ is compact, there exists $\kappa$ with this property. Denote by $V_\kappa(r)$ the volume of a ball of radius $r$ in a complete, simply connected Riemannian manifold with constant curvature $\kappa$. By Bishop's Theorem (see [12]), we know that $r \mapsto \mathrm{Vol}(B(x, r))/V_\kappa(r)$ is a non-increasing function, where $B(x, r) = \{z \in M: d_\gamma(x, z) < r\}$ is the geodesic ball centered at $x$ with radius $r$. Note that $M$ is the closure of $B(x, \mathrm{diam}(M, \gamma))$ for any $x \in M$. If $\{B(a_1, \delta/2), \dots, B(a_D, \delta/2)\}$, with $D = D(\delta/2)$, is a $\delta/2$-filling, then

$$D \le \frac{\mathrm{Vol}(M)}{\inf_{1 \le i \le D} \mathrm{Vol}(B(a_i, \delta/2))} \le \frac{V_\kappa(\mathrm{diam}(M, \gamma))}{V_\kappa(\delta/2)}.$$

Therefore, $N(\delta) \le C(\mathrm{diam}(M, \gamma), \kappa)\,\delta^{-d}$.

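Lemma A.1. Let $\mathcal{F}(M) = \{\xi \in C^1(M): \|\xi\|_\infty \le 1,\ \|\nabla\xi\|_\infty \le 1\}$. Then the covering number of $\mathcal{F}(M)$ for the supremum norm, which we denote by $N(\delta, \mathcal{F}(M), \|\cdot\|_\infty)$, satisfies $\log N(\delta, \mathcal{F}(M), \|\cdot\|_\infty) \le A\,\delta^{-d}$.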

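Proof of Lemma A.1. Let $\mathcal{A} = \{B(a_1, \delta), \dots, B(a_N, \delta)\}$ be a covering of $M$ by open balls of radius $\delta$. By the remark above, we may assume that $N \le C(\mathrm{diam}(M, \gamma), \kappa)\,\delta^{-d}$. Also, we can choose the covering $\mathcal{A}$ such that $B(a_i, \delta) \cap B(a_{i+1}, \delta) \neq \emptyset$ for $1 \le i \le N - 1$ and $a_i \neq a_j$ for $i \neq j$. Let $\xi \in \mathcal{F}(M)$; we define the function $\tilde\xi = \sum_{k=1}^N \tilde\xi_k\, I_{D_k}$ with $\tilde\xi_k = \delta\,[\xi(a_k)/\delta]$, where $D_1 = B(a_1, \delta)$, $D_k = B(a_k, \delta) \setminus \bigcup_{j < k} B(a_j, \delta)$, and $[a]$ denotes the integer part of $a$.

Let $x \in M$ and $1 \le k \le N$ such that $x \in D_k$. Then $|\xi(x) - \tilde\xi(x)| \le |\xi(x) - \xi(a_k)| + |\xi(a_k) - \tilde\xi_k|$. Since $|\xi(x) - \xi(a_k)| \le \|\nabla\xi\|_\infty\, d_\gamma(x, a_k) < \delta$ and $\xi(a_k) = \tilde\xi_k + \delta\left(\xi(a_k)/\delta - [\xi(a_k)/\delta]\right) = \tilde\xi_k + \vartheta\delta$ with $|\vartheta| < 1$, we have that $|\xi(x) - \tilde\xi(x)| < 2\delta$.

For the first value $\tilde\xi_1$ of a generic function $\tilde\xi$, we have at most $[2/\delta] + 1$ possibilities, because $\|\xi\|_\infty \le 1$. Moreover, since consecutive balls intersect, $d_\gamma(a_{k-1}, a_k) < 2\delta$, and hence

$$|\tilde\xi_k - \tilde\xi_{k-1}| \le |\tilde\xi_k - \xi(a_k)| + |\xi(a_k) - \xi(a_{k-1})| + |\xi(a_{k-1}) - \tilde\xi_{k-1}| < 4\delta.$$

Therefore, since the $\tilde\xi_k$ are multiples of $\delta$, for each value of $\tilde\xi_{k-1}$ there are at most 9 possibilities for $\tilde\xi_k$. Then it is easy to verify that

$$N(2\delta, \mathcal{F}(M), \|\cdot\|_\infty) \le \left([2/\delta] + 1\right) 9^{N}.$$

Since $N \le C(\mathrm{diam}(M, \gamma), \kappa)\,\delta^{-d}$, taking logarithms gives $\log N(\delta, \mathcal{F}(M), \|\cdot\|_\infty) \le A\,\delta^{-d}$ for a suitable constant $A$ and $\delta$ small enough, which finishes the proof. □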
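The step-function construction used in the proof above can be checked numerically in the simplest case $M = S^1$ ($d = 1$) with $\xi = \sin$, which belongs to $\mathcal{F}(S^1)$ because $|\sin| \le 1$ and $|\cos| \le 1$. The sketch below is only an illustration: the equally spaced centres are our choice of covering, the integer part is taken as truncation toward zero, and the function names are hypothetical. It checks the two facts the counting argument relies on, namely $\|\xi - \tilde\xi\|_\infty < 2\delta$ and jumps smaller than $4\delta$ between consecutive levels.

```python
import math

def circle_dist(a, b):
    d = abs(a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)

def discretize(xi, delta):
    """Step approximation tilde_xi = sum_k delta*[xi(a_k)/delta] I_{D_k} on the unit circle.

    Centres are equally spaced with gap 2*pi/N <= delta, so the arcs B(a_k, delta) cover
    S^1 and consecutive arcs overlap; [.] is the integer part (truncation toward zero).
    """
    n_centres = math.ceil(2.0 * math.pi / delta)
    centres = [2.0 * math.pi * k / n_centres for k in range(n_centres)]
    levels = [delta * math.trunc(xi(a) / delta) for a in centres]

    def tilde(x):
        for a, level in zip(centres, levels):   # first k with x in B(a_k, delta), i.e. x in D_k
            if circle_dist(x, a) < delta:
                return level
        raise ValueError("arcs do not cover the circle")

    return tilde, levels

if __name__ == "__main__":
    tilde, levels = discretize(math.sin, 0.2)
    grid = [2.0 * math.pi * i / 1000 for i in range(1000)]
    print(max(abs(math.sin(x) - tilde(x)) for x in grid), "< 2 * delta =", 0.4)
```

Each $\tilde\xi$ is determined by its first level and the sequence of bounded jumps, which is exactly what the counting bound $([2/\delta] + 1)\,9^N$ exploits.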

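Remark A.2. Since $N(\delta, \mathcal{F}(M), L^2(Q)) \le N(\delta, \mathcal{F}(M), \|\cdot\|_\infty)$ for any probability measure $Q$, Lemma A.1 entails that the covering number of $\mathcal{F}(M)$ satisfies $\log N(\delta, \mathcal{F}(M), L^2(Q)) \le A\,\delta^{-d}$.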

A.3 Proof of Theorem 3.2.1

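Using a Taylor expansion around $\hat{\boldsymbol{\beta}}_R$, we have that $\mathbf{S}_n = s_n^{-1}\mathbf{A}_n(\hat{\boldsymbol{\beta}}_R - \boldsymbol{\beta})$, where

$$\mathbf{S}_n = \frac{1}{n}\sum_{i=1}^n \psi_1\!\left(\frac{\hat r_i - \hat{\boldsymbol{\eta}}_i^{\mathrm{T}}\boldsymbol{\beta}}{s_n}\right) w_1(\|\hat{\boldsymbol{\eta}}_i\|)\,\hat{\boldsymbol{\eta}}_i,$$

$$\mathbf{A}_n = \frac{1}{n}\sum_{i=1}^n \psi_1'\!\left(\frac{\hat r_i - \hat{\boldsymbol{\eta}}_i^{\mathrm{T}}\tilde{\boldsymbol{\beta}}}{s_n}\right) w_1(\|\hat{\boldsymbol{\eta}}_i\|)\,\hat{\boldsymbol{\eta}}_i\hat{\boldsymbol{\eta}}_i^{\mathrm{T}},$$

and $\tilde{\boldsymbol{\beta}}$ is an intermediate point between $\boldsymbol{\beta}$ and $\hat{\boldsymbol{\beta}}_R$. Analogous arguments to those used in Lemma 2 in [2] allow us to show that $\mathbf{A}_n \xrightarrow{p} \mathbf{A}$, where $\mathbf{A}$ is defined in A2.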

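Since $\frac{1}{\sqrt{n}}\sum_{i=1}^n \psi_1(\varepsilon_i/\sigma_\varepsilon)\, w_1(\|\boldsymbol{\eta}_i\|)\,\boldsymbol{\eta}_i$ is asymptotically normally distributed with covariance $\boldsymbol{\Sigma}$, it will be enough to show that

$$\sqrt{n}\left[\mathbf{S}_n - \frac{1}{n}\sum_{i=1}^n \psi_1(\varepsilon_i/s_n)\, w_1(\|\boldsymbol{\eta}_i\|)\,\boldsymbol{\eta}_i\right] \xrightarrow{p} 0, \qquad (9)$$

$$\sqrt{n}\left[\frac{1}{n}\sum_{i=1}^n \psi_1(\varepsilon_i/s_n)\, w_1(\|\boldsymbol{\eta}_i\|)\,\boldsymbol{\eta}_i - \frac{1}{n}\sum_{i=1}^n \psi_1(\varepsilon_i/\sigma_\varepsilon)\, w_1(\|\boldsymbol{\eta}_i\|)\,\boldsymbol{\eta}_i\right] \xrightarrow{p} 0. \qquad (10)$$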

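We first prove (9). Write $\gamma_j(t) = \phi_j(t) - \hat\phi_{j,R}(t)$ for $0 \le j \le p$, $\boldsymbol{\gamma}(t) = (\gamma_1(t), \dots, \gamma_p(t))^{\mathrm{T}}$ and $\Delta_i = \gamma_0(t_i) - \boldsymbol{\gamma}^{\mathrm{T}}(t_i)\boldsymbol{\beta}$, so that $\hat r_i - \hat{\boldsymbol{\eta}}_i^{\mathrm{T}}\boldsymbol{\beta} = \varepsilon_i + \Delta_i$ and $\hat{\boldsymbol{\eta}}_i = \boldsymbol{\eta}_i + \boldsymbol{\gamma}(t_i)$. Using a Taylor expansion of order two, we have the following decomposition:

$$\sqrt{n}\left[\mathbf{S}_n - \frac{1}{n}\sum_{i=1}^n \psi_1(\varepsilon_i/s_n)\, w_1(\|\boldsymbol{\eta}_i\|)\,\boldsymbol{\eta}_i\right] = \sum_{k=1}^5 \mathbf{S}_{kn},$$

where

$$\mathbf{S}_{1n} = \frac{1}{\sqrt{n}\,s_n}\sum_{i=1}^n \psi_1'(\varepsilon_i/s_n)\,\Delta_i\, w_1(\|\boldsymbol{\eta}_i\|)\,\boldsymbol{\eta}_i,$$

$$\mathbf{S}_{2n} = \frac{1}{\sqrt{n}}\sum_{i=1}^n \psi_1(\varepsilon_i/s_n)\left[w_1(\|\hat{\boldsymbol{\eta}}_i\|)\,\hat{\boldsymbol{\eta}}_i - w_1(\|\boldsymbol{\eta}_i\|)\,\boldsymbol{\eta}_i\right],$$

$$\mathbf{S}_{3n} = \frac{1}{\sqrt{n}}\sum_{i=1}^n \left[\psi_1\!\left(\frac{\varepsilon_i + \Delta_i}{s_n}\right) - \psi_1\!\left(\frac{\varepsilon_i}{s_n}\right) - \psi_1'\!\left(\frac{\varepsilon_i}{s_n}\right)\frac{\Delta_i}{s_n}\right]\left[w_1(\|\hat{\boldsymbol{\eta}}_i\|)\,\hat{\boldsymbol{\eta}}_i - w_1(\|\boldsymbol{\eta}_i\|)\,\boldsymbol{\eta}_i\right],$$

$$\mathbf{S}_{4n} = \frac{1}{2\sqrt{n}\,s_n^2}\sum_{i=1}^n \psi_1''(\zeta_i)\,\Delta_i^2\, w_1(\|\boldsymbol{\eta}_i\|)\,\boldsymbol{\eta}_i,$$

$$\mathbf{S}_{5n} = \frac{1}{\sqrt{n}\,s_n}\sum_{i=1}^n \psi_1'(\varepsilon_i/s_n)\,\Delta_i\left[w_1(\|\hat{\boldsymbol{\eta}}_i\|)\,\hat{\boldsymbol{\eta}}_i - w_1(\|\boldsymbol{\eta}_i\|)\,\boldsymbol{\eta}_i\right],$$

with $\zeta_i$ an intermediate point between $\varepsilon_i/s_n$ and $(\varepsilon_i + \Delta_i)/s_n$. By A3, A5 and A6, it is easy to see that $\|\mathbf{S}_{kn}\| \xrightarrow{p} 0$ for $k = 3, 4, 5$.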



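Let, for $\sigma > 0$, a scalar function $\xi$ and a vector function $\boldsymbol{\xi} = (\xi_1, \dots, \xi_p)^{\mathrm{T}}$ on $M$,

$$J_{1n}(\sigma, \xi) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \psi_1'(\varepsilon_i/\sigma)\,\xi(t_i)\, w_1(\|\boldsymbol{\eta}_i\|)\,\boldsymbol{\eta}_i,$$

$$J_{2n}(\sigma, \boldsymbol{\xi}) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \psi_1(\varepsilon_i/\sigma)\left[w_1(\|\boldsymbol{\eta}_i + \boldsymbol{\xi}(t_i)\|)\,(\boldsymbol{\eta}_i + \boldsymbol{\xi}(t_i)) - w_1(\|\boldsymbol{\eta}_i\|)\,\boldsymbol{\eta}_i\right],$$

so that $\mathbf{S}_{1n} = s_n^{-1}\left[J_{1n}(s_n, \gamma_0) - \sum_{s=1}^p \beta_s J_{1n}(s_n, \gamma_s)\right]$ and $\mathbf{S}_{2n} = J_{2n}(s_n, \boldsymbol{\gamma})$. Therefore, it remains to show that $J^{j}_{1n}(s_n, \gamma_s) \xrightarrow{p} 0$ and $J^{j}_{2n}(s_n, \boldsymbol{\gamma}) \xrightarrow{p} 0$ for $1 \le j \le p$ and $0 \le s \le p$, where the superscript $j$ denotes the $j$-th coordinate. From now on, we omit the superscript $j$ for the sake of simplicity.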

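Let $\mathcal{F}(M) = \{\xi \in C^1(M): \|\xi\|_\infty \le 1,\ \|\nabla\xi\|_\infty \le 1\}$ and consider the classes of functions

$$\mathcal{F}_1 = \left\{f_{1,\sigma,\xi}(r, \boldsymbol{\eta}, t) = \psi_1'\!\left(\frac{r - \boldsymbol{\eta}^{\mathrm{T}}\boldsymbol{\beta}}{\sigma}\right)\xi(t)\, w_1(\|\boldsymbol{\eta}\|)\,\eta_j :\ \sigma \in [\sigma_\varepsilon/2, 2\sigma_\varepsilon],\ \xi \in \mathcal{F}(M)\right\},$$

$$\mathcal{F}_2 = \left\{f_{2,\sigma,\boldsymbol{\xi}}(r, \boldsymbol{\eta}, t) = \psi_1\!\left(\frac{r - \boldsymbol{\eta}^{\mathrm{T}}\boldsymbol{\beta}}{\sigma}\right)\left[w_1(\|\boldsymbol{\eta} + \boldsymbol{\xi}(t)\|)\,(\eta_j + \xi_j(t)) - w_1(\|\boldsymbol{\eta}\|)\,\eta_j\right] :\ \sigma \in [\sigma_\varepsilon/2, 2\sigma_\varepsilon],\ \boldsymbol{\xi} = (\xi_1, \dots, \xi_p),\ \xi_s \in \mathcal{F}(M)\right\}.$$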

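Note that the independence of $\varepsilon_i$ and $(\mathbf{x}_i, t_i)$, A2, and the fact that the errors $\varepsilon$ have a symmetric distribution imply that $E(f(r_i, \boldsymbol{\eta}_i, t_i)) = 0$ for any $f \in \mathcal{F}_1 \cup \mathcal{F}_2$. As in [2], it is easy to see that the covering numbers of the classes $\mathcal{F}_1$ and $\mathcal{F}_2$ satisfy

$$N(C_1\epsilon, \mathcal{F}_1, L^2(Q)) \le N(\epsilon, \mathcal{F}(M), L^2(Q))\; N(\epsilon, [\sigma_\varepsilon/2, 2\sigma_\varepsilon], |\cdot|),$$

$$N(C_2\epsilon, \mathcal{F}_2, L^2(Q)) \le N^p(\epsilon, \mathcal{F}(M), L^2(Q))\; N(\epsilon, [\sigma_\varepsilon/2, 2\sigma_\varepsilon], |\cdot|),$$

where $Q$ is any probability measure. By Remark A.2, the covering number of $\mathcal{F}(M)$ satisfies $\log N(\epsilon, \mathcal{F}(M), L^2(Q)) \le A\epsilon^{-d}$. Therefore, these classes have finite uniform entropy. For $0 < \delta < 1$, consider the subclasses $\mathcal{F}_{1,\delta}$ and $\mathcal{F}_{2,\delta}$ of $\mathcal{F}_1$ and $\mathcal{F}_2$, respectively, defined by

$$\mathcal{F}_{1,\delta} = \{f_{1,\sigma,\xi} \in \mathcal{F}_1:\ \xi \in \mathcal{F}(M),\ \|\xi\|_\infty < \delta\},$$

$$\mathcal{F}_{2,\delta} = \{f_{2,\sigma,\boldsymbol{\xi}} \in \mathcal{F}_2:\ \boldsymbol{\xi} = (\xi_1, \dots, \xi_p),\ \xi_s \in \mathcal{F}(M),\ \|\xi_s\|_\infty < \delta\}.$$

For any $\epsilon > 0$, let $0 < \delta < 1$. By A5 and A6, for $n$ large enough, $P(s_n \in [\sigma_\varepsilon/2, 2\sigma_\varepsilon]) > 1 - \delta/2$ and $P(\gamma_s \in \mathcal{F}(M) \text{ and } \|\gamma_s\|_\infty < \delta) > 1 - \delta/2$ for $0 \le s \le p$.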

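Then, the maximal inequality for covering numbers entails that, for $0 \le s \le p$,

$$P(|J_{1n}(s_n, \gamma_s)| > \epsilon) \le P\left(|J_{1n}(s_n, \gamma_s)| > \epsilon,\ s_n \in [\sigma_\varepsilon/2, 2\sigma_\varepsilon],\ \gamma_s \in \mathcal{F}(M),\ \|\gamma_s\|_\infty < \delta\right) + \delta$$

$$\le P\left(\sup_{f \in \mathcal{F}_{1,\delta}} \left|\frac{1}{\sqrt{n}}\sum_{i=1}^n f(r_i, \boldsymbol{\eta}_i, t_i)\right| > \epsilon\right) + \delta \le \frac{1}{\epsilon}\, E\left(\sup_{f \in \mathcal{F}_{1,\delta}} \left|\frac{1}{\sqrt{n}}\sum_{i=1}^n f(r_i, \boldsymbol{\eta}_i, t_i)\right|\right) + \delta \le \frac{C}{\epsilon}\, \mathcal{J}(\delta, \mathcal{F}_1) + \delta,$$

where $\mathcal{J}(\delta, \mathcal{F}) = \sup_Q \int_0^\delta \sqrt{1 + \log N(u\|F\|_{Q,2}, \mathcal{F}, L^2(Q))}\, du$, with $F$ an envelope of $\mathcal{F}$ and $C$ a constant. Since $\mathcal{F}_1$ satisfies the uniform-entropy condition, we get that $\lim_{\delta \to 0} \mathcal{J}(\delta, \mathcal{F}_1) = 0$; therefore $J_{1n}(s_n, \gamma_s) \xrightarrow{p} 0$ and hence $\mathbf{S}_{1n} \xrightarrow{p} 0$. Similarly, applying the same argument to $J_{2n}(s_n, \boldsymbol{\gamma})$ and the class $\mathcal{F}_2$, we get that $\mathbf{S}_{2n} \xrightarrow{p} 0$.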

The proof of (10) follows using arguments analogous to those considered for (9). □